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Abstract. The Boolean lattice of logical statements induces the free distributive lattice of questions. 
Inclusion on this lattice is based on whether one question answers another. Generalizing the zeta 
function of the question lattice leads to a valuation called relevance or bearing, which is a measure 
of the degree to which one question answers another. Richard Cox conjectured that this degree can 
be expressed as a generalized entropy. With the assistance of yet another important result from 
Janos Aczel, I show that this is indeed the case, and that the resulting inquiry calculus is a natural 
generalization of information theory. This approach provides a new perspective on the Principle of 
Maximum Entropy. 



"A wise man's question contains half the answer." Solomon Ibn Gabirol (1021-1058) 



Questions and answers, the unknown and the known, empty and full are all examples 
of duality. In this paper, I will show that a precise understanding of the duality of ques- 
tions and answers allows one to determine the unique functional form of the relevance 
measure on questions that is consistent with the probability measure on the set of logical 
statements that form their answers. Much of the material presented in this paper relies 
on fundamental background material that I regrettably cannot take the space to address. 
While I provide a brief background below, I recommend the following previous papers 
[1, 2, 3, 4] in which more background, along with useful references, can be found. 



A partially ordered set, or poset for short, is a set of elements ordered according to 
a binary ordering relation, generically written <. One element b is said to 'include' 
another element a when a <b. Inclusion on the poset is encoded by the zeta function 



If there is a greatest element in the poset, it is called the top T, and dually, there may be 
a bottom ±. Given two elements a and b of the poset, their upper bound is the set of all 
elements x, such that a <x and b < x, where x ^ a and x b. If there exists a unique 



QUESTIONS AND ANSWERS 



LATTICES AND VALUATIONS 




{zeta function) 



(1) 



least upper bound, this element is called the join of a and b, written aV b. Similarly, 
if there exists a greatest lower bound, that element is called the meet, written af\b. A 
lattice is a poset in which unique joins and meets of all pairs of elements exist. In this 
case, the join and meet can be seen as binary operations that take two objects and map 
them to a third. For this reason, lattices are algebras. When one views the lattice as a set 
of elements ordered by an ordering relation, one is taking a structural viewpoint. When 
it is viewed as a set of elements and a set of operations, one is taking an operational 
viewpoint, which is an algebra. Last, elements that cannot be expressed as a join of two 
other elements are called join-irreducible elements. 

Generalizing the zeta function allows one to define degrees of inclusion. In actuality, 
it is more useful to generalize the dual of the zeta function [4], which is the zeta function 
(1) with the conditions flipped around. The result is a real-valued function that captures 
the notion of degrees of inclusion 

{1 if x>y 
if JcAy = _L {degrees of inclusion) (2) 

z otherwise, where < z < 1 . 

The rules by which degrees of inclusion are manipulated as one moves about the 
lattice are found by maintaining consistency with the lattice structure, or equivalently 
the underlying algebra. All lattices are associative, and as such, they all possess a sum 
rule. With an important result from Caticha [5], I have shown that all distributive lattices 
give rise [3, 4] to a sum rule, 

z{xi\/X2\/ ••■\/Xn,t)^Y^z{xi,t)-Y^z{xiAxj,t)+ ^ z{xi AXj AXk.t) (3) 

i i< j i< j<k 

a product rule 

z{xAy,t) = Cz{x,t)z{y,xAt), (4) 

and a Bayes' Theorem 

zMz^^^ (5) 
zix,t) 

This immediately conjures up thoughts of probability theory, however this result is 
surprisingly far more general. 



PROBABILITY 

It is now well-understood that probability theory is literally an extension of logic. A 
set of logical statements ordered by implication gives rise to a Boolean lattice, which is 
equivalently a Boolean algebra. Figure 1 shows the lattice generated from a set of 

three mutually exclusive and exhaustive assertions: a, k, and n. This example is taken 
from two previous papers, and deals with the issue of 'Who stole the tarts?' in Lewis 
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FIGURE 1. The ordered set of down-sets of the lattice of assertions A results in the corresponding 
lattice of questions Q = 0{A) ordered by C. ^ is dual to Q in the sense of Birkhoff's Representation 
Theorem. The join-irreducible elements of Q are the ideal questions J, which are isomorphic to the lattice 
yi ~ J = 3(Q). Down-sets corresponding to several questions are illustrated on the right. 



Carroll's Alice in Wonderland, specifically 

a = 'Alice stole the tarts!' 

k = 'The Knave of Hearts stole the tarts!' 

n = 'No one stole the tarts! ' 

The lattice shows all possible statements that can be formed from these three atoms. 

The zeta function of this lattice, which is a function of two statements, indicates 
whether one statement implies another. Generalizing the dual of the zeta function results 
in a bi-valuation p{x\y) = z{x,y) that follows a sum rule, a product rule and a Bayes' 
theorem. This bi-valuation is a measure that quantifies the degree to which one statement 
implies another, and is essentially the degree of implication that Cox considered in his 
seminal work [6, 7]. Thus order theory gives rise to probability theory [3, 4]. 

Probability theory, however, does not instruct us on how to assign priors (which can 
be considered as valuations, Qg. p{x\T) =v{x)). An important theorem by Gian-Carlo 
Rota [8, Theorem 1, Corollary 2, p.35] makes this fact clear: 

Theorem 1 A valuation v in a finite distributive lattice L is uniquely determined by the 
values it takes on the set of join-irreducibles of L, and these values can be arbitrarily 
assigned. 

There is no information in the Boolean algebra, and hence the inferential calculus 
(probability theory), to instruct us in assigning priors. We must instead rely on addi- 
tional principles, such as symmetry, constraints, and consistency with other aspects of 
the problem to assign priors. However, once the priors are assigned, order-theoretic prin- 
ciples dictate the remaining probabilities through the inferential calculus. 



RELEVANCY 



Cox defined a question as the set of all possible logical statements that answer it [9]. 
With questions being described by sets, their natural algebra is the distributive algebra, 
with the join V and meet A identified with the set union U and set intersection fl, 
respectively. The natural ordering relation among questions is the relation 'answers', 
which can be represented mathematically by C. Thus questions possess two algebraic 
operations V and A analogous to the familiar disjunction and conjunction of logical 
statements. In fact, we even use the words 'or' and 'and' to describe them in spoken 
language. However, the algebra is not Boolean as Cox surmised, since the definition 
of a question is rather restrictive (i.e. not all sets of logical statements correspond to 
questions). Thus questions do not, in general, have complements [1]. 

In order theory. Cox's definition of a question is equivalent to saying that a question is 
a down-set, where a down-set is the set of all poset elements that contain every element 
that includes any other element of the set [10, 4] 

Definition 1 (Down-set) A down-set is a subset J of an ordered set L, written J — [L, 
where if a & J, x G L, x < a then x e /. 

where J is the question, L is the ordered set of logical statements, and < is 'implies' — 
The question lattice is then formed by taking the set of down-sets of the assertion lattice 
and ordering them according to C. This operation, called the ordered set of down-sets 0, 
takes the assertion lattice to the question lattice, Q = 0(^1) . Figure 1 shows the lattice Q3 
generated from A^. The lattice Q3 depicts all possible questions that can be asked in this 
example. Note that A = ja, AN = iaV n, AKN = ja V V n, and ANyAK = ANUAK. 
The lattice A is dual to the lattice Q in the sense of Birkhoff's Representation Theorem 
[10, 1], which relates a distributive lattice to its ordered set of down-sets. 

The ideal questions J, are the set of join-irreducible elements of Q. They are not 
practical questions, but are useful mathematical constructs since they are isomorphic to 
the assertion lattice. The real questions 31 are the set of all questions that can be answered 
by each of the atomic statements a, k, or n. The question / =AV^VA^isa special real 
question that I call the central issue [4]. It is the unique real question that answers all the 
others. In this example, it asks 'Precisely who stole the tarts?' 

Just as in the lattice of assertions, we can define the degree to which one question 
answers another by d{X\Y) = z{X,Y). Since the lattice of questions is distributive, there 
exists a sum rule, a product rule, and a Bayes' theorem. This degree is called relevance, 
and due to the duality between A and Q it is entirely reasonable to expect that relevance 
on Q is related to probability on A. We explore this in the next section. 



CONSISTENCY BETWEEN PROBABILITY AND RELEVANCY 

The sum, product and Bayes' rules ensure consistency within the assertion and question 
lattices, however our assignments of probabilities and relevances must also be mutually 
consistent with one another. Rota's theorem assures that we need only to determine the 
relevances of the join-irreducible questions; the rest follow from the inquiry calculus. 



In this section, I show that the form of the relevance is uniquely determined by 
requiring consistency between the probability measure defined on the assertion lattice 
A and the relevance measure defined on its isomorphic counterpart, the lattice of ideal 
questions J. This demonstration requires but a single assumption: the degree to which 
the top question T answers an ideal question X depends only on the probability of the 
assertion x from which the question X originated. That is, given the ideal question X — 

d{X\T)=H{p{x\T)), (6) 

where // is a function to be determined. 

There are four important consistency requirements imposed by the lattice structure 
and the induced calculus. First, the sum rule (3) for questions demands that given three 
questions X,Y,Qe Q the relevance is additive only when X AY = ± 

d{XyY\Q)^d{X\Q) + d{Y\Q), iff X AY ^ ±. (additivity) (7) 

and is subadditive 

d{XVY\Q)<d{X\Q) + d{Y\Q). (subadditivity) (8) 

in general; a result of the terms in the sum rule (3), which avoid double-counting the 
overlap between the two questions [3, 4]. Commutativity of the join requires that 

J(XiVX2V---VX„|e)=J(X;,(l)VX;,(2)V---VX;,(„)|e) (symmetry) (9) 

for all permutations (;r(l), ;r(2) ■ ■ ■ , n{n)) of (1,2, • • • Thus the relevance must be 
symmetric with respect to the order of the joins. 

Last, since any assertion /, known to be false can be identified with the bottom _L in 
A, its corresponding ideal question F = |/ G J can be identified with ± in Q. Since for 
all questions X e Q it is true that X V _L = X, we have the expansibility condition 

d{Xi VX2 V • • • VXnVF\Q) = d{Xi VX2 V • • • VX„\Q). (expansibility) (10) 

I now define a partition question as a real question where its set of answers are neatly 
partitioned. More specifically 

Definition 2 (Partition Question) A partition question is a real question P G % formed 
from the join of a set of ideal questions P = V/Li^/ where y Xj,Xk G 3{Q), Xj AX]^ = ± 
when j 7^ k. 

For a partition question P, the degree to which the top question T answers P can be 
easily written using (7) 

diP\T) = diy Xi\T) = j^HipixilT)). (11) 

(=1 (=1 

An important result from Aczel et al. [11] states that if a function of this form satisfies 
additivity (7), subadditivity (8), symmetry (9), and expansibility (10), then the unique 
form of the function is a linear combination of the Shannon and Hartley entropies 



d{P\T) =aHm{pi,P2,--- ,Pn)+b oHm{Pl,P2,--- ,Pn), 



(12) 



where pi = p{xi\T), a,bscicQ arbitrary non-negative constants. The Shannon entropy [12] 
is defined as 

n 

Hm{P\,P2, - ■ ■ ,Pn) = - Y,Pi^Og2Ph (13) 

and the Hartley entropy [13] is defined as 

oHUPhP2r--,Pn)=log2N{P), (14) 

where N{P) is the number of non-zero arguments pi. An additional condition suggested 
by Aczel states that the Shannon entropy is the unique solution if the result is to be small 
for small probabilities [11]; that is, the relevance varies continuously as a function of the 
probability. This result is important since it rules out the use of other types of entropy, 
such as the Renyi and Tsallis entropies, for the purposes of inference and inquiry. 
Given these results, the relevance of an ideal question (6) can be written as 

d{X\T) = -ap{x\T)\og2p{x\T), (15) 

which is proportional to the probability-weighted surprise. The sum rule allows us to 
calculate more complex relevances, such as that of the central issue 

d{A\/KWN\T) oc ~Pal0g2Pa- Pk^Og2Pk~ P,Aog2Pn, (16) 

where pa = p(a|T), • • • , and we have set the arbitrary constant a — I. 

With the relevances of the join-irreducible questions defined, the inquiry calculus 
allows us to compute the relevance between any two questions. The degree to which 
an arbitrary question Q answers a question X can be found from d{X\T) by recognizing 
that d{X\Q) — d{X\Q AT) and using Bayes' Theorem. Furthermore, the relevance of Q 
to the join of two questions such as AN U KN = AN V KN is 

d{ANyKN\Q)=d{AN\Q)+d{KN\Q)-d{ANAKN\Q), (17) 

which is clearly related to the mutual information, although the conditionality of this 
measure absent in the information-theoretic notation. Thus the relevance of the join of 
two questions is akin to mutual information, which describes what the two questions ask 
in common. Similarly, the relevance of the meet of two questions d{AN AKN\Q) is akin 
to the joint entropy. In the context of information theory. Cox's choice in naming the 
common question and joint question is very satisfying. 

The inquiry calculus holds new possibilities. Not only does it allow for conditionality, 
which is obscured and implicit in information theory, but the relevance of questions 
comprised of the joins of multiple questions can be computed using the sum rule, which 
proves to be the generalized entropy conjectured by Cox [7, 9]. Furthermore, special 
cases of these relevances have appeared before in the literature [4]. The result presented 
here is a well-founded generalization of information theory, where the relationships 
among a set of any number of questions can be quantified. 

Last, it should be noted that by setting a — bm (12), and using (13), (14), we get 

n p. 

Hm{phP2,--- ,Pn) = -l^P?log2T' (1^) 

i=l n 

which is the relative entropy based on a uniform measure. 



MAXIMUM ENTROPY 



This result provides new insights into the assumptions underlying the Principle of 
Maximum Entropy [14, 15]. Consider both the assertion lattice A and its dual the 
question lattice Q. What does it mean to assign probabilities to A by maximizing the 
entropy? When we 'maximize the entropy', we are actually maximizing the relevance 
of the top question T to the central issue /, i.e. we maximize d{I\T). This says that 
we are setting up the probability assignments so that the question that asks everything is 
maximally relevant to the central issue. To understand what this means, it is useful to see 
what happens in a special case. In the situation where we have no constraints, this results 
in assigning uniform prior probabilities to the join-irreducible elements of A. What if in 
the case of three statements, we assign the probabilities non-uniformly: p{xi\T) — 0.1, 
/'(-^2|T) = 0.4, p{xt,\T) = 0.5. In this case, the central issue no longer has the maximal 
relevance. Instead, the question defined by the set X1X2 VX3 = (jci Vjc2,jci ,jc2,jc3, _L} has 
the maximal relevance. This suggests that a re-parametrization of the problem is more 
relevant {xi \/X2,X3}. Thus when we assign priors based on maximizing the entropy, 
we are relying on the fact that we believe that we have a relevant parametrization of the 
problem. In other words: we have identified the relevant variables. 



DISCUSSION 

Within the last few years there has a been a surge of interest in Bayesian methods. Much 
of this is due to the fact that Bayesian methods work, and work well. However, with 
this surge of activity the ideas of Jaynes and Cox are slowly being lost as converted 
statisticians focus more and more on mathematical rigor and less on the basic concepts. 
Ironically, it is this focus on mathematical rigor and loss of the basic concepts that buried 
Bayesian methods in the 19th century. Cox's realization that Bayesian probability theory 
is the only theory consistent with Boolean logic is key, since it rules out all other possible 
theories of inference. Jaynes' realization that the entropy of statistical mechanics is 
related to Shannon's entropy, and that one can use it to assign priors is crucial since 
it ties together the physics of thermodynamics to inference. However, the successful 
application of inference in several key areas of physics seems to be of little interest to 
statisticians, which is puzzling given both the great success of the theories and the great 
mysteries that they simultaneously resolve and reveal. 

The basic concepts are key, because it is by fully understanding these concepts that 
we can generalize these ideas to form new theories. I have found that Cox's idea 
of introducing a real number representing a degree of belief can be generalized to 
introducing a real-valued function generalizing the zeta function of a lattice. This allows 
one to take any ordered set that forms a lattice and introduce a measure describing the 
degree of inclusion associated with that ordering relation. 

Cox's definition of a question is the definition of a down-set in order theory. With 
this definition in hand, I showed that the ordered set of down-sets of assertions gives 
rise to the set of all possible questions, which forms a distributive lattice. Realizing 
that Caticha's results on quantum mechanical experimental setups demonstrate that sum 



and product rules are associated with distributive lattices, I showed that the calculus 
of inquiry has sum and product rules analogous to the inferential calculus. This paper 
extends these results by requiring consistency between the measures assigned to the 
lattice of assertions and the lattice of questions. With yet another important result by 
Aczel, I have shown that the relevance measure on the lattice of questions is based on 
the Shannon entropy. This is significant since it rules out the use of other entropies in 
inference (eg. Renyi entropy and Tsallis entropy), as well as inquiry. The result is that 
the inquiry calculus and the relevance measure is a natural generalization of information 
theory. Furthermore, these results provide a new perspective on the role of Maximum 
Entropy in prior probability assignment. 
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